The Adversarial GLUE Benchmark
Performance of SMART_RoBERTa (single model) on AdvGLUE
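Overall Statistics
Figure: GLUE Dev versus AdvGLUE scores (scale 0 to 100) for each task, with AdvGLUE split into Word, Sentence, Human, and Overall.
Task (metric): GLUE Dev / AdvGLUE Word / AdvGLUE Sentence / AdvGLUE Human / AdvGLUE Overall
SST-2 (Accuracy): 96.6 / 57.0 / 45.8 / 39.9 / 50.9
QQP (Accuracy): 91.2 / 63.4 / 54.1 / 73.8 / 64.2
QQP (F1): 88.4 / 48.0 / 19.0 / 32.0 / 44.3
QNLI (Accuracy): 95.0 / 66.9 / 43.4 / 27.7 / 52.2
RTE (Accuracy): 91.0 / 73.8 / 66.2 / n/a / 70.4
MNLI-m (Accuracy): 90.8 / 49.8 / 39.1 / n/a / 45.6
MNLI-mm (Accuracy): 90.7 / 51.5 / 33.1 / 22.0 / 36.1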
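One way to read the overall statistics is as a per-task drop from GLUE Dev to AdvGLUE Overall. The sketch below is illustrative only: the pairing of each value with its task is an assumption read off the chart, the helper name `accuracy_drop` is hypothetical, and only the accuracy metric is used (QQP F1 is left out).

```python
# Accuracy values read off the "Overall Statistics" chart for SMART_RoBERTa.
# Assumption: each task maps to (GLUE Dev, AdvGLUE Overall) as paired here.
scores = {
    "SST-2": (96.6, 50.9),
    "QQP": (91.2, 64.2),
    "QNLI": (95.0, 52.2),
    "RTE": (91.0, 70.4),
    "MNLI-m": (90.8, 45.6),
    "MNLI-mm": (90.7, 36.1),
}


def accuracy_drop(dev: float, adv: float) -> float:
    """Absolute drop in accuracy points from GLUE Dev to AdvGLUE Overall."""
    return round(dev - adv, 1)


# Compute the drop for every task and list the largest drops first.
drops = {task: accuracy_drop(dev, adv) for task, (dev, adv) in scores.items()}
for task, drop in sorted(drops.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{task:8s} drop = {drop:.1f} points")
```

Under these assumptions, MNLI-mm shows the largest drop (54.6 points) and RTE the smallest (20.6 points).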
Performance of SMART_RoBERTa (single model) on each task
The Stanford Sentiment Treebank (SST-2)
Figure: adversarial accuracy by attack type (scale 0 to 100).
Word: Typo 56.2, Knowledge 54.9, Embedding 60.8, Context 58.9, Composition 52.8
Sentence: Syntactic 34.7, Distraction 62.6
Human: CheckList 39.9
Quora Question Pairs (QQP)
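Figure: adversarial accuracy and adversarial F1 by attack type (scale 0 to 100).
Word, adversarial accuracy: Typo 56.0, Knowledge 82.4, Embedding 69.0, Context 68.6, Composition 62.3
Word, adversarial F1: Typo 42.1, Knowledge 66.7, Embedding 38.1, Context 35.3, Composition 52.9
Sentence: Syntactic accuracy 54.1, F1 19.0
Human: CheckList accuracy 73.8, F1 32.0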
MultiNLI (MNLI) matched
Figure: adversarial accuracy by attack type (scale 0 to 100).
Word: Typo 61.1, Knowledge 50.0, Embedding 57.1, Context 56.3, Composition 42.0
Sentence: Syntactic 36.2, Distraction 44.4
MultiNLI (MNLI) mismatched
Figure: adversarial accuracy by attack type (scale 0 to 100).
Word: Typo 35.1, Knowledge 60.3, Embedding 74.1, Context 60.3, Composition 49.2
Sentence: Syntactic 27.4, Distraction 43.1
Human: StressTest 15.5, ANLI 28.4
Question NLI (QNLI)
Figure: adversarial accuracy by attack type (scale 0 to 100).
Word: Typo 70.8, Knowledge 63.4, Embedding 60.3, Context 66.3, Composition 70.5
Sentence: Syntactic 33.8, Distraction 56.6
Human: CheckList 31.6, AdvSQuAD 25.0
Recognizing Textual Entailment (RTE)
Figure: adversarial accuracy by attack type (scale 0 to 100).
Word: Typo 82.6, Knowledge 77.4, Embedding 74.4, Context 77.8, Composition 63.6
Sentence: Syntactic 60.2, Distraction 77.1
AdvGLUE
UIUC Secure Learning Lab
Microsoft Research